I load the FCS files of the Cryostem patients, along with the matrix that contains clinical information about these patients.
I also extract information at the couple level by comparing the donor and the recipient of each couple, which yields two new features:
- gender compatibility (donor gender -> recipient gender)
- age of the recipient, in days, when he/she received the graft
Here’s a sample of the resulting table:
##       COUPLENUMBER Gender        DOG        DOB            GROUP gender_comp age_recip
## D4147            1      M 2015-07-23 1960-03-31     non_tolerant          MF     19529
## R4147            1      F 2015-07-23 1962-02-02     non_tolerant          MF     19529
## D3574           10      F 2015-04-10 1950-09-17 primary_tolerant          FF     16936
## R3574           10      F 2015-04-10 1968-11-26 primary_tolerant          FF     16936
## D557            11      F 2013-05-29 1982-01-16 primary_tolerant          FF     12009
## R557            11      F 2013-05-29 1980-07-12 primary_tolerant          FF     12009
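The two couple-level features in the table above can be derived roughly as follows. This is a minimal sketch in base R, not the original code: it assumes that donors and recipients are identified by the `D`/`R` prefix of the row names, that `DOG` is the graft date, and that `age_recip` is the number of days between the recipient's `DOB` and `DOG` (which is consistent with the values shown). The object names `clin` and `couple` are hypothetical.

```r
# Clinical table with the first two couples from the sample above.
clin <- data.frame(
  COUPLENUMBER = c("1", "1", "10", "10"),
  Gender = c("M", "F", "F", "F"),
  DOG = as.Date(c("2015-07-23", "2015-07-23", "2015-04-10", "2015-04-10")),
  DOB = as.Date(c("1960-03-31", "1962-02-02", "1950-09-17", "1968-11-26")),
  row.names = c("D4147", "R4147", "D3574", "R3574"),
  stringsAsFactors = FALSE
)

# Assumption: donors have row names starting with "D", recipients with "R".
is_donor <- startsWith(rownames(clin), "D")
donors <- clin[is_donor, ]
recips <- clin[!is_donor, ]

# Align each recipient with the donor of the same couple.
recips <- recips[match(donors$COUPLENUMBER, recips$COUPLENUMBER), ]

couple <- data.frame(
  COUPLENUMBER = donors$COUPLENUMBER,
  # Donor gender followed by recipient gender, e.g. "MF".
  gender_comp = paste0(donors$Gender, recips$Gender),
  # Recipient age at graft, in days (assumed unit).
  age_recip = as.numeric(difftime(recips$DOG, recips$DOB, units = "days")),
  stringsAsFactors = FALSE
)

# Copy the couple-level features back to both members of each couple.
idx <- match(clin$COUPLENUMBER, couple$COUPLENUMBER)
clin$gender_comp <- couple$gender_comp[idx]
clin$age_recip <- couple$age_recip[idx]
```

Storing the couple-level values on both the donor and the recipient row, as in the table, keeps later joins by patient ID simple.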
We re-use the FlowSOM backbone that was generated on the St Louis data, and we now map the Cryostem patients to that backbone. The backbone consists of 40 metaclusters:
These metaclusters were manually annotated by Laetitia Dubouchet:
We can then generate patient profiles, i.e., for every patient, the percentage of his/her cells that mapped to each FlowSOM metacluster. These percentages will be used as features for the rest of the analysis.
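As an illustration of what such a profile is, the sketch below turns per-cell metacluster assignments into per-patient percentages. It is a base R sketch under assumptions: the input `cells` (one row per cell, with a patient ID and a metacluster label) and the three metaclusters are hypothetical stand-ins for the real FlowSOM mapping output.

```r
# Hypothetical input: one row per cell, with its patient and assigned metacluster.
cells <- data.frame(
  patient = c("D1", "D1", "D1", "D1", "R1", "R1"),
  metacluster = factor(c(1, 1, 2, 3, 2, 2), levels = 1:3)
)

# Count cells per patient and metacluster, then convert each row to percentages.
counts <- table(cells$patient, cells$metacluster)
profiles <- as.data.frame.matrix(100 * prop.table(counts, margin = 1))
```

Declaring the metacluster as a factor with all levels ensures that metaclusters with no cells in a given patient still appear as a column with 0%.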
A substantial number of patients had only CD19-positive cells, so a large share of their cells did not match the St Louis FlowSOM backbone. We therefore decided to remove CD19 from the analysis and not to use it to map the Cryostem patients to the St Louis FlowSOM backbone.
We then visualise the donors and recipients on the same PCA, which was built on the patients' cell profiles as identified in the FlowSOM map above. We measure the distances between donors and recipients from the same couple on the PCA, represented here as lines linking donors to recipients. The color of each line corresponds to the recipient's tolerance group.
After defining the patients' phenotypic profiles in 40 metaclusters, we also investigate the expression of the functional markers in these patients' cells. We identify the percentage of cells expressing specific functional markers in FlowSOM's metaclusters, based on the FlowJo workspaces of Laetitia (she has defined positivity thresholds in the patients for all of these markers).
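The donor-recipient distance on the PCA can be sketched as below. This is an illustrative base R example with random data, and it assumes that the distance is the Euclidean distance on the first two principal components and that couples are matched through the `D`/`R` prefix of the row names; the actual analysis may differ on both points.

```r
set.seed(1)

# Hypothetical profiles: 3 couples (6 patients), 5 metacluster percentages.
profiles <- matrix(
  runif(30), nrow = 6,
  dimnames = list(c("D1", "R1", "D2", "R2", "D3", "R3"), paste0("MC", 1:5))
)

# PCA on centred and scaled features; keep the first two components.
pca <- prcomp(profiles, center = TRUE, scale. = TRUE)
scores <- pca$x[, 1:2]

# Match donors and recipients by couple ID (row name without the D/R prefix).
couple_ids <- unique(sub("^[DR]", "", rownames(scores)))
donor_xy <- scores[paste0("D", couple_ids), , drop = FALSE]
recip_xy <- scores[paste0("R", couple_ids), , drop = FALSE]

# Euclidean distance between the two members of each couple.
couple_dist <- sqrt(rowSums((donor_xy - recip_xy)^2))
names(couple_dist) <- couple_ids
```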
## /41BB+.1 /41BB+.2 /41BB+.3 /41BB+.4 /41BB+.5
## D2854 0.002887871 0.001895375 0.000000000 0.002315836 0.003853565
## D3209 0.007259241 0.005369897 0.003484321 0.003917180 0.011198028
## D3594 0.004463770 0.002489535 0.003588517 0.003240178 0.003640777
## D3636 0.016920560 0.007683420 0.003630496 0.006056638 0.017437108
## D3715 0.016009429 0.014983923 0.009554140 0.009774569 0.024979898
## D3920 0.004521964 0.002179328 0.005012531 0.001232286 0.003526093
Finally, we can merge the information about the percentages in the FlowSOM metaclusters and the percentage of positive cells for the functional markers in these metaclusters into one big matrix. We will now use this matrix as the patients' profiles.
We first investigate differences between patients at the couple level. For each couple, we compute the donor - recipient difference, which tells us how each CyTOF feature differs between the donor and the recipient of the same couple.
In the following section, we identify the metaclusters or functional markers that are linked to tolerance in couples. To do so, we take a two-step approach, in which we first seek differences between (primary + secondary) tolerant and non-tolerant couples, and then between primary and secondary tolerant couples.
The following method is applied to identify features that are informative regarding tolerance. For each feature, a multinomial regression model with the patients' tolerance as outcome and the feature as predictor is built using the stats R package on CRAN, and the average AUC over all three pairwise outcome comparisons is extracted using the pROC R package on CRAN. Next, the feature is permuted 1000 times and the corresponding models are updated, resulting in a permutation distribution of AUCs. If the quantile of the original feature's AUC within this distribution exceeds 0.9, the feature is selected. To ensure that a selected feature does not merely capture information related to age or gender, the above procedure is repeated with a model that contains, besides the feature, the recipient's age and the gender compatibility between donor and recipient as additional predictors.
This is the distribution of the feature quantiles:
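A minimal base R sketch of the donor - recipient difference is shown below. The profile values are made up, and matching the two members of a couple through the `D`/`R` prefix of the row names is an assumption about the data layout.

```r
# Hypothetical profiles (percentages) for two couples and two features.
profiles <- matrix(
  c(10, 30, 20, 25,
    90, 70, 80, 75),
  nrow = 4,
  dimnames = list(c("D1", "R1", "D2", "R2"), c("MC1", "MC2"))
)

# Couple IDs are the row names without the D/R prefix.
couple_ids <- unique(sub("^[DR]", "", rownames(profiles)))

# Donor value minus recipient value, one row per couple.
delta <- profiles[paste0("D", couple_ids), , drop = FALSE] -
  profiles[paste0("R", couple_ids), , drop = FALSE]
rownames(delta) <- couple_ids
```

A positive entry means the feature is higher in the donor than in the recipient of that couple.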
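The permutation step can be illustrated with the simplified sketch below. It is not the pipeline used in this report: to stay within base R it uses a two-group outcome with a logistic model (`glm`) and a rank-based AUC, instead of a multinomial model with pairwise AUCs from pROC. The helper names `auc`, `model_auc` and `perm_quantile`, the simulated data, and the reduced number of permutations are all assumptions made for illustration.

```r
# Rank-based (Mann-Whitney) AUC for a binary outcome coded 0/1.
auc <- function(score, y) {
  r <- rank(score)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# AUC of the fitted probabilities of a logistic model y ~ feature (+ covariates).
model_auc <- function(x, y, covars = NULL) {
  dat <- data.frame(y = y, x = x)
  if (!is.null(covars)) dat <- cbind(dat, covars)
  fit <- suppressWarnings(glm(y ~ ., data = dat, family = binomial()))
  auc(fitted(fit), y)
}

# Quantile of the observed AUC within the AUCs obtained after permuting the feature.
perm_quantile <- function(x, y, covars = NULL, n_perm = 1000) {
  observed <- model_auc(x, y, covars)
  null_auc <- replicate(n_perm, model_auc(sample(x), y, covars))
  mean(null_auc <= observed)
}

set.seed(42)
y <- rep(c(0, 1), each = 20)             # simulated two-group outcome
informative <- rnorm(40, mean = 2 * y)   # simulated feature that differs between groups
q <- perm_quantile(informative, y, n_perm = 200)
selected <- q > 0.9                      # same 0.9 threshold as in the text
```

Passing a data frame of covariates (for example recipient age and gender compatibility) through `covars` mimics the adjusted model: only the feature is permuted, so the covariates keep their explanatory power in every permuted fit.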
## [1] "82 features were selected with a 0.90 threshold in the Cryostem cohort."
We can then see which of these features are selected in both the Cryostem and the St Louis cohorts:
## [1] 27
## [1] "X30_CD4.TCM.Th2.Like...Treg..20.50.."
## [2] "X.CD24..2_CD8.TEM..47.....TEMRA..42.....TCM..4.....TSCM..3.."
## [3] "X.CD24..14_CD4.Th1.Th17.TEM.TEMRA.like..90.....DP.Th1.Th17.TEMRA.like..10.."
## [4] "X.CD24..38_B.transitional..64..."
## [5] "X.CD38..1_Conventional.DCs..100.."
## [6] "X.CD38..5_NK.cells..98.....T.cells.DN.TEMRA..2.."
## [7] "X.CD38..6_CD8.T.cells.FoxP3..CXCR3..CCR7..FAS."
## [8] "X.CD38..7_CD8.TSCM..67...CD8.TCM..33.."
## [9] "X.CD38..10_T.population.CD8...DP.CD4..FoxP3...40.70..."
## [10] "X.CD38..11_DN.T.cells...CD8.low.TEMRA"
## [11] "X.CD38..13_CD4.TCM...TEM.Th2.like.FoxP3...6.30.."
## [12] "X.CD38..14_CD4.Th1.Th17.TEM.TEMRA.like..90.....DP.Th1.Th17.TEMRA.like..10.."
## [13] "X.CD38..16_T.population.Naive.TSCM"
## [14] "X.CD38..17_CD8...87...DP..10....TEMRA..70....TSCM..30....CXCR5..B.markers"
## [15] "X.CD38..19_DN..86...CD8low.13...TSCM.like.FoxP3...30.Ã .90.."
## [16] "X.CD38..20_NK.cells..66.....Conventional.DCs...some.moDCs..34.."
## [17] "X.CD38..23_DN..86.....CD8low..13...TSCM.like"
## [18] "X.CD38..24_CD4.TCM.Th17.like"
## [19] "X.CD38..25_B.naives..90.....B.memory..10.."
## [20] "X.CD38..27_CD4.Treg"
## [21] "X.CD38..28_B.naives..91.....B.memory..8.....B.transitional..2.."
## [22] "X.CD38..31_CD4.Treg"
## [23] "X.CD38..32_CD4.Treg...B.markers"
## [24] "X.CD38..33_B.naives..100.."
## [25] "X.CD38..37_CD4.TSCM..50.....CD4.TEMRA..50.....B.markers"
## [26] "X.ICOS..29_DP.Treg"
## [27] "X.PD1..11_DN.T.cells...CD8.low.TEMRA"
Finally, we can identify the features that had the same behaviour in the two cohorts (i.e., were over- or underexpressed in the same group of patients):
## [1] "25 genes out of the 27 common selected genes have the same behaviour in both cohorts."
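One way to check whether a feature behaves the same way in both cohorts is sketched below. The report does not state the exact criterion, so this example assumes a simple one: the sign of the difference in group means must agree. The data, the group labels and the helper `direction` are hypothetical.

```r
# Direction of a feature: +1 if higher on average in tolerant patients, -1 if lower.
direction <- function(values, group) {
  sign(mean(values[group == "tolerant"]) - mean(values[group == "non_tolerant"]))
}

group <- c("tolerant", "tolerant", "non_tolerant", "non_tolerant")

# Hypothetical values of two features in each cohort (same patient order as `group`).
cohort_a <- data.frame(f1 = c(5, 6, 1, 2), f2 = c(1, 2, 5, 6))
cohort_b <- data.frame(f1 = c(4, 7, 2, 1), f2 = c(6, 5, 1, 2))

dir_a <- sapply(cohort_a, direction, group = group)
dir_b <- sapply(cohort_b, direction, group = group)

# Features whose direction agrees between the two cohorts.
same_behaviour <- names(dir_a)[dir_a == dir_b]
```

Here `f1` is higher in tolerant patients in both cohorts and is kept, while `f2` flips direction and is dropped.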
We can visualise how these features change at the couple level: I extract the expression of these common features in the recipients, and in the couples (donor value - recipient value), in the two cohorts. Each of the following figures shows the percentage of cells that belong to a given feature. The recipients' values are represented on the x-axis, such that recipients who had many cells corresponding to the feature are situated further to the right. The donor - recipient values are represented on the y-axis, such that couples in which the feature was more expressed in the donor than in the recipient are situated closer to the top.
For each feature, the Cryostem patients are on the left and the St Louis patients are on the right, for comparison between the two cohorts.
We can also visualise the selected features in a graph for the Cryostem cohort. The features that are most correlated among the patients in both cohorts are linked. The features that are overexpressed in the tolerant couples are colored in blue, and the ones that are overexpressed in the non-tolerant couples are colored in red.
## IGRAPH 9f512b7 UN-- 25 132 --
## + attr: name (v/c), value (e/n)
## + edges from 9f512b7 (vertex names):
## [1] X.CD24..2_CD8.TEM..47.....TEMRA..42.....TCM..4.....TSCM..3..--X.CD24..14_CD4.Th1.Th17.TEM.TEMRA.like..90.....DP.Th1.Th17.TEMRA.like..10..
## [2] X.CD38..1_Conventional.DCs..100.. --X.CD38..5_NK.cells..98.....T.cells.DN.TEMRA..2..
## [3] X.CD38..5_NK.cells..98.....T.cells.DN.TEMRA..2.. --X.CD38..10_T.population.CD8...DP.CD4..FoxP3...40.70...
## [4] X.CD38..5_NK.cells..98.....T.cells.DN.TEMRA..2.. --X.CD38..11_DN.T.cells...CD8.low.TEMRA
## + ... omitted several edges
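An edge list like the one printed above can be built from a feature correlation matrix, as in the base R sketch below. The correlation method (Spearman), the cut-off of 0.7 and the simulated features are assumptions; the report does not state how "most correlated" was defined.

```r
set.seed(7)

# Hypothetical feature matrix: 20 patients, 3 features; f2 is built to track f1.
x <- matrix(rnorm(60), nrow = 20, dimnames = list(NULL, paste0("f", 1:3)))
x[, "f2"] <- x[, "f1"] + rnorm(20, sd = 0.1)

cor_mat <- cor(x, method = "spearman")
threshold <- 0.7  # hypothetical cut-off

# Keep each strongly correlated pair once (upper triangle only).
idx <- which(abs(cor_mat) >= threshold & upper.tri(cor_mat), arr.ind = TRUE)
edges <- data.frame(
  from = rownames(cor_mat)[idx[, "row"]],
  to = colnames(cor_mat)[idx[, "col"]],
  value = cor_mat[idx],
  stringsAsFactors = FALSE
)
```

A data frame in this shape can be passed to `igraph::graph_from_data_frame(edges, directed = FALSE)`, in which case `value` becomes an edge attribute, matching the `value (e/n)` attribute in the output above.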
Now that we have selected features that seem to play a role in tolerance and are common to both cohorts (with a quantile threshold of 0.90), we investigate whether some of them are related to the age of the recipients, to the gender compatibility between the donor and recipient, to both, or to neither.
I first build new models using the features that were kept in the two cohorts as informative when comparing tolerant and non-tolerant patients, and then generate forest plots in PDFs.
We apply the same feature selection technique as we used for identifying features in tolerant versus non-tolerant couples. This is the resulting distribution of the feature quantiles:
## [1] "61 features were selected with a 0.90 threshold in the Cryostem cohort."
We can see which of these features are also selected in the St Louis cohort:
## [1] 11
## [1] "X.CD24..35_Conventional.DCs...90.."
## [2] "X.HLADR..39_B.naive..50...B.memory..50...or.unspecified.B.cells"
## [3] "X.ICOS..9_MoDCs..99.....CD4.TEM..1.."
## [4] "X.IL10..8_DP.Treg.TSCM.like."
## [5] "X.IL10..10_T.population.CD8...DP.CD4..FoxP3...40.70..."
## [6] "X.IL10..16_T.population.Naive.TSCM"
## [7] "X.IL10..31_CD4.Treg"
## [8] "X.IL10..35_Conventional.DCs...90.."
## [9] "X.OX40..5_NK.cells..98.....T.cells.DN.TEMRA..2.."
## [10] "X.OX40..10_T.population.CD8...DP.CD4..FoxP3...40.70..."
## [11] "X.OX40..36_CD4.TSCM"
Finally, we can see how many of these features had the same behaviour in the two cohorts (i.e., were over- or underexpressed in the same group of patients):
## [1] "1 genes out of the 11 common selected genes have the same behaviour in both cohorts."
## [1] "X.HLADR..39_B.naive..50...B.memory..50...or.unspecified.B.cells"
We can visualise how these features change at the couple level: I extract the expression of these common features in the recipients, and in the couples (donor value - recipient value), in the two cohorts:
Now that we have selected features that seem to play a role in tolerance and are common to both cohorts (with a quantile threshold of 0.90), we investigate whether some of them are related to the age of the recipients, to the gender compatibility between the donor and recipient, to both, or to neither.
I build new models using the features that were kept in the two cohorts as informative when comparing primary versus secondary tolerant patients, and generate PDFs to visualise them.
We then investigate the recipients' CyTOF data specifically. I load the FCS files and the table containing clinical information about the recipients.
We can then isolate the recipients' profiles, i.e., for every recipient, the percentage of her/his cells that mapped to each FlowSOM metacluster:
We can visualise the recipients' FlowSOM profiles (the percentage of each patient's cells that map to the different metaclusters) in a tSNE map, to see which recipients are most similar:
In the St Louis cohort, we had observed a specific cluster of non-tolerant patients (containing R690, R598, R219, R830, ...). Here, I generate a tSNE with the recipients from both the St Louis and Cryostem cohorts, to see whether some of the Cryostem recipients are similar to this non-tolerant St Louis cluster.
I can first color the recipients based on their cohort. The recipients of the same cohort do not seem to "cluster" together:
I can also color the patients by day of CyTOF experiment, to see how strongly batch affects the data. The recipients do not seem to cluster together per batch.
The recipients of the Cryostem cohort that are closest to the non-tolerant St Louis cluster are mainly CMV-positive patients.
## Warning: Removed 3 rows containing missing values (geom_point).
If we look at the metaclusters of the recipients, we see that the CMV+ Cryostem patients seem to have many CD8 TEM/TEMRA cells and fewer naive B cells, regardless of their tolerance group. They share these characteristics with the St Louis group containing R690, R598, ...
I can also color the tSNE plot by gender compatibility and by recipient age, but these two features do not appear to play a role here.
I extract the ratios of cells that are positive for the functional markers, based on the thresholds that Laetitia has provided.
We can merge information about the percentages in the FlowSOM metaclusters and the percentage of positive cells for the functional markers in these metaclusters. This is what we use to define patient profiles in the following analyses.
We can visualise the recipients' phenotypic and functional profiles by running a PCA:
We apply the feature selection technique described above to identify features that differ between tolerant and non-tolerant recipients.
The distribution of the CyTOF features among the patients is quite right-skewed, so we log2-transform the data to make it better suited to the models that we build.
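A small sketch of such a transformation is given below. Because percentages can be exactly zero, a small offset is added before taking the logarithm; the offset value `pseudo` and the example values are assumptions, as the report does not say how zeros were handled.

```r
# Hypothetical right-skewed feature (fractions of positive cells), including a zero.
x <- c(0, 0.01, 0.02, 0.05, 0.4)

# Assumed pseudo-count so that log2(0) is avoided.
pseudo <- 1e-3
x_log <- log2(x + pseudo)

# Skew indicator: gap between mean and median relative to the range, before and after.
skew_before <- (mean(x) - median(x)) / diff(range(x))
skew_after <- (mean(x_log) - median(x_log)) / diff(range(x_log))
```

The transformation is monotone, so the ordering of patients on each feature is preserved while the long right tail is compressed.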
We can then apply feature selection as previously described.
## [1] "74 features were selected with a 0.90 threshold in the Cryostem cohort."
We can then see which of these features are also selected in the St Louis cohort:
## [1] "25 features were commonly found in both cohorts"
Finally, we identify the features that had the same behaviour in the two cohorts (i.e., were over- or underexpressed in the same group of recipients):
## [1] "24 features out of the 25 common selected features have the same behaviour in both cohorts."
## [1] "X4_CD8.Naives.CXCR3...60....CD8.naives..28...CD8.TSCM..16...CD8.TCM..7.."
## [2] "X7_CD8.TSCM..67...CD8.TCM..33.."
## [3] "X23_DN..86.....CD8low..13...TSCM.like"
## [4] "X.CD24._2_CD8.TEM..47.....TEMRA..42.....TCM..4.....TSCM..3.."
## [5] "X.CD24._4_CD8.Naives.CXCR3...60....CD8.naives..28...CD8.TSCM..16...CD8.TCM..7.."
## [6] "X.CD24._30_CD4.TCM.Th2.Like...Treg..20.50.."
## [7] "X.CD25._3_CD8.TCM.Tc2.like..76.....DP.TCM..24.."
## [8] "X.CD38._6_CD8.T.cells.FoxP3..CXCR3..CCR7..FAS."
## [9] "X.CD38._7_CD8.TSCM..67...CD8.TCM..33.."
## [10] "X.CD38._10_T.population.CD8...DP.CD4..FoxP3...40.70..."
## [11] "X.CD38._16_T.population.Naive.TSCM"
## [12] "X.CD38._17_CD8...87...DP..10....TEMRA..70....TSCM..30....CXCR5..B.markers"
## [13] "X.CD38._19_DN..86...CD8low.13...TSCM.like.FoxP3...30.Ã .90.."
## [14] "X.CD38._20_NK.cells..66.....Conventional.DCs...some.moDCs..34.."
## [15] "X.CD38._23_DN..86.....CD8low..13...TSCM.like"
## [16] "X.CD38._25_B.naives..90.....B.memory..10.."
## [17] "X.CD38._27_CD4.Treg"
## [18] "X.CD38._33_B.naives..100.."
## [19] "X.CD38._35_Conventional.DCs...90.."
## [20] "X.CTLA4._2_CD8.TEM..47.....TEMRA..42.....TCM..4.....TSCM..3.."
## [21] "X.CTLA4._10_T.population.CD8...DP.CD4..FoxP3...40.70..."
## [22] "X.CTLA4._20_NK.cells..66.....Conventional.DCs...some.moDCs..34.."
## [23] "X.CTLA4._24_CD4.TCM.Th17.like"
## [24] "X.PD1._24_CD4.TCM.Th17.like"
We can then visualise these common features in boxplots of the two cohorts:
Graph of these selected features:
Now that we have selected features that seem to play a role in tolerance and are common to both cohorts (with a quantile threshold of 0.90), we investigate whether some of them are related to the age of the recipients, to the gender compatibility between the donor and recipient, to both, or to neither.
I build new models using the features that were kept in the two cohorts as informative when comparing tolerant and non-tolerant recipients, and generate forest plots in PDFs.
We run feature selection to identify features of interest when comparing primary versus secondary tolerant recipients.
## [1] "50 features were selected with a 0.90 threshold in the Cryostem cohort."
We can then see which of these features are also selected in the St Louis cohort:
## [1] "4 features were commonly found in both cohorts"
We can see how many of these features had the same behaviour in the two cohorts (i.e., were over- or underexpressed in the same group of patients):
## [1] "2 features out of the 4 common selected features have the same behaviour in both cohorts."
## [1] "X.CD38._16_T.population.Naive.TSCM" "X.CD38._29_DP.Treg"
We can then visualise these common features in boxplots of the two cohorts:
Now that we have selected features that seem to play a role in tolerance and are common to both cohorts (with a quantile threshold of 0.90), we investigate whether some of them are related to the age of the recipients, to the gender compatibility between the donor and recipient, to both, or to neither.
I first build new models using the four features that were kept in the two cohorts as informative when comparing primary and secondary tolerant patients, and then generate forest plots in PDFs.